Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition
The success of self-attention in NLP has led to recent applications in
end-to-end encoder-decoder architectures for speech recognition. Separately,
connectionist temporal classification (CTC) has matured as an alignment-free,
non-autoregressive approach to sequence transduction, either by itself or in
various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully
self-attentional network for CTC, and show it is tractable and competitive for
end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing
CTC models and most encoder-decoder models, with character error rates (CERs)
of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean,
with a fixed architecture and one GPU. Similar improvements hold for WERs after
LM decoding. We motivate the architecture for speech, evaluate position and
downsampling approaches, and explore how label alphabets (character, phoneme,
subword) affect attention heads and performance.
Comment: Accepted to ICASSP 201
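The CTC objective this paper builds on marginalises over every frame-level alignment that collapses to the target label sequence. A minimal sketch of that forward computation in pure Python (function and variable names are ours, not from the paper; a real system would use a batched GPU implementation):

```python
import math

def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under CTC, via the forward algorithm.

    log_probs: list of T frames, each a list of per-symbol log-probabilities
    labels:    target label sequence (no blanks)
    """
    NEG_INF = float("-inf")

    def logsumexp(xs):
        m = max(xs)
        if m == NEG_INF:
            return NEG_INF
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # Extend the target with blanks: b, l1, b, l2, ..., b
    ext = [blank]
    for l in labels:
        ext.extend([l, blank])
    S, T = len(ext), len(log_probs)

    # alpha[s]: log-prob of all alignments of the frames seen so far
    # that currently end at extended position s
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]
            if s > 0:
                terms.append(alpha[s - 1])
            # Skipping a position is allowed unless it lands on a blank
            # or would merge two identical labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or the final blank.
    tail = [alpha[-1]] if S == 1 else [alpha[-1], alpha[-2]]
    return -logsumexp(tail)
```

With two frames of uniform probabilities over {blank, a, b} and target "a", the three alignments "a a", "blank a", "a blank" each have probability 1/9, giving a loss of -log(1/3).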
Transformation Based Interpolation with Generalized Representative Values
Fuzzy interpolation offers the potential to model
problems with sparse rule bases, as opposed to dense rule
bases deployed in traditional fuzzy systems. It thus supports the
simplification of complex fuzzy models and facilitates inferences
when only limited knowledge is available. This paper first
introduces the general concept of representative values (RVs),
and then uses it to present an interpolative reasoning method
which can be used to interpolate fuzzy rules involving arbitrary
polygonal fuzzy sets, by means of scale and move transformations.
Various interpolation results over different RV implementations
are illustrated to show the flexibility and diversity of this
method. A realistic application shows that the interpolation-based
inference can outperform conventional inference methods.
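The RV idea can be made concrete: choose a mapping from a polygonal fuzzy set to a single point, then measure where an observation sits between the two rule antecedents through those points. A sketch under one simple RV choice (the average of the characteristic points; the function names and example values are ours):

```python
def representative_value(points):
    """One possible RV definition: the average of a polygonal fuzzy set's
    characteristic x-coordinates. Other definitions (centre of core,
    membership-weighted averages, ...) fit the same interface."""
    return sum(points) / len(points)

def interpolation_ratio(a1, a2, observation):
    """Where the observation sits between the two rule antecedents,
    measured through their representative values (0 = at a1, 1 = at a2)."""
    r1, r2 = representative_value(a1), representative_value(a2)
    return (representative_value(observation) - r1) / (r2 - r1)

# Triangular fuzzy sets given as their three characteristic points
A1, A2 = [0, 5, 6], [11, 13, 14]
A_star = [7, 8, 9]
lam = interpolation_ratio(A1, A2, A_star)  # roughly 0.48: A* sits near the middle
```

Swapping in a different `representative_value` changes the interpolation result without touching the rest of the method, which is the flexibility the abstract refers to.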
Fuzzy interpolative reasoning via scale and move transformation
Interpolative reasoning not only helps reduce the
complexity of fuzzy models but also makes inference in sparse
rule-based systems possible. This paper presents an interpolative
reasoning method by means of scale and move transformations. It
can be used to interpolate fuzzy rules involving complex polygon,
Gaussian or other bell-shaped fuzzy membership functions. The
method works by first constructing a new inference rule via
manipulating two given adjacent rules, and then by using scale
and move transformations to convert the intermediate inference
results into the final derived conclusions. This method has three
advantages thanks to the proposed transformations: 1) it can
handle interpolation of multiple antecedent variables with simple
computation; 2) it guarantees the uniqueness as well as normality
and convexity of the resulting interpolated fuzzy sets; and 3) it suggests
a variety of definitions for representative values, providing
a degree of freedom to meet different requirements. Comparative
experimental studies are provided to demonstrate the potential of
this method.
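The two-step procedure described above (build an intermediate rule, then transform its consequent) can be sketched for triangular sets. This is a deliberately simplified variant: it uses a single scale rate about the representative value and omits the move transformations that the full method applies to match the observation's shape exactly; all names are ours.

```python
def blend(p1, p2, lam):
    """Intermediate set: pointwise convex combination of two rule sets."""
    return [(1 - lam) * a + lam * b for a, b in zip(p1, p2)]

def rep(points):
    """Representative value: here, the mean of the characteristic points."""
    return sum(points) / len(points)

def scale_about_rv(points, rate):
    """Stretch a set about its representative value by the given rate."""
    r = rep(points)
    return [r + rate * (p - r) for p in points]

def interpolate(A1, B1, A2, B2, A_star):
    """Simplified scale-and-move interpolation for triangular sets:
    1) place an intermediate rule A' => B' between the two given rules so
       that A' shares its representative value with the observation A*;
    2) find the scale rate that turns A's support into A*'s support;
    3) apply the same rate to B'. (The full method additionally applies
       move transformations to the sub-supports so that A' maps onto A*
       exactly; those are omitted here for brevity.)"""
    lam = (rep(A_star) - rep(A1)) / (rep(A2) - rep(A1))
    A_p, B_p = blend(A1, A2, lam), blend(B1, B2, lam)
    rate = (A_star[-1] - A_star[0]) / (A_p[-1] - A_p[0])
    return scale_about_rv(B_p, rate)
```

Because the scale acts about the representative value, the conclusion's RV stays between the RVs of the two rule consequents, which is what gives the interpolated result its intuitive placement.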
Fuzzy interpolation with generalized representative values
Fuzzy interpolative reasoning offers the potential to model problems using sparse rule bases, as opposed to dense rule bases deployed in traditional fuzzy systems. It thus supports the simplification of complex fuzzy models in terms of rule number and facilitates inferences when limited knowledge is available. This paper presents an interpolative reasoning method by means of scale and move transformations
Scale and move transformation-based fuzzy interpolative reasoning: A revisit
This paper generalises the previously proposed
interpolative reasoning method 151 to cover interpolations involving
complex polygon, Gaussian or other bell-shaped fuzzy
membership functions. This can be achieved by the generality
of the proposed scale and move transformations. The method
works by first constructing a new inference rule via manipulating
two given adjacent rules, and then by using scale and move
transformations to convert the intermediate inference results into
the final derived conclusions. This generalised method has two
advantages thanks to the proposed transformations: 1)
It can easily handle interpolation of multiple antecedent variables
with simple computation; and 2) It guarantees the uniqueness as
well as normality and convexity of the resulting interpolated fuzzy
sets. Numerical examples are provided to demonstrate the use of
this method.
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model
for generating novel image captions. It directly models the probability
distribution of generating a word given previous words and an image. Image
captions are generated by sampling from this distribution. The model consists
of two sub-networks: a deep recurrent neural network for sentences and a deep
convolutional network for images. These two sub-networks interact with each
other in a multimodal layer to form the whole m-RNN model. The effectiveness of
our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K,
Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In
addition, we apply the m-RNN model to retrieval tasks for retrieving images or
sentences, and achieve significant performance improvements over the
state-of-the-art methods which directly optimize the ranking objective function
for retrieval. The project page of this work is:
www.stat.ucla.edu/~junhua.mao/m-RNN.html
Comment: Add a simple strategy to boost the performance of the image captioning
task significantly. More details are shown in Section 8 of the paper. The
code and related data are available at https://github.com/mjhucla/mRNN-CR.
arXiv admin note: substantial text overlap with arXiv:1410.109
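The fusion step the abstract describes (the two sub-networks interacting in a multimodal layer) amounts to projecting the word embedding, the recurrent state and the CNN image feature into one shared space and summing before a nonlinearity. A sketch of that layer alone, in pure Python with illustrative dimensions and initialisation of our choosing:

```python
import math
import random

def matvec(M, v):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

class MultimodalLayer:
    """Sketch of an m-RNN-style fusion layer: three linear projections
    into a common multimodal space, added and passed through tanh.
    (The paper uses a scaled variant of tanh; plain tanh keeps the
    sketch simple.)"""

    def __init__(self, d_word, d_rnn, d_img, d_mm, seed=0):
        rng = random.Random(seed)
        init = lambda rows, cols: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                                   for _ in range(rows)]
        self.Vw = init(d_mm, d_word)  # projects the word embedding
        self.Vr = init(d_mm, d_rnn)   # projects the recurrent state
        self.Vi = init(d_mm, d_img)   # projects the CNN image feature

    def forward(self, w, r, img):
        fused = [a + b + c for a, b, c in zip(
            matvec(self.Vw, w), matvec(self.Vr, r), matvec(self.Vi, img))]
        return [math.tanh(x) for x in fused]
```

At each decoding step the multimodal output would feed a softmax over the vocabulary, so the image feature conditions every word prediction rather than only the initial state.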